Activity 2 - UNITED KINGDOM ROAD ACCIDENT RECORDS REPORT
Data Analyst : Jomel Tomeo
Initialization of Core Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import folium
from folium.plugins import HeatMap
from scipy.stats import f_oneway
import warnings
warnings.filterwarnings('ignore')
Loading Dataset(s) into DataFrames
Requirements:
- DataFrame Identifier
- Dataset File Location
accident = pd.read_csv("datasets\\uk_road_accident.csv")
accident
| | Index | Accident_Severity | Accident Date | Latitude | Light_Conditions | District Area | Longitude | Number_of_Casualties | Number_of_Vehicles | Road_Surface_Conditions | Road_Type | Urban_or_Rural_Area | Weather_Conditions | Vehicle_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200701BS64157 | Serious | 5/6/2019 | 51.506187 | Darkness - lights lit | Kensington and Chelsea | -0.209082 | 1 | 2 | Dry | Single carriageway | Urban | Fine no high winds | Car |
| 1 | 200701BS65737 | Serious | 2/7/2019 | 51.495029 | Daylight | Kensington and Chelsea | -0.173647 | 1 | 2 | Wet or damp | Single carriageway | Urban | Raining no high winds | Car |
| 2 | 200701BS66127 | Serious | 26-08-2019 | 51.517715 | Darkness - lighting unknown | Kensington and Chelsea | -0.210215 | 1 | 3 | Dry | NaN | Urban | NaN | Taxi/Private hire car |
| 3 | 200701BS66128 | Serious | 16-08-2019 | 51.495478 | Daylight | Kensington and Chelsea | -0.202731 | 1 | 4 | Dry | Single carriageway | Urban | Fine no high winds | Bus or coach (17 or more pass seats) |
| 4 | 200701BS66837 | Slight | 3/9/2019 | 51.488576 | Darkness - lights lit | Kensington and Chelsea | -0.192487 | 1 | 2 | Dry | NaN | Urban | NaN | Other vehicle |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 660678 | 201091NJ14695 | Fatal | 21-10-2022 | 58.445475 | Darkness - lights lit | Highland | -3.065535 | 1 | 1 | Wet or damp | Single carriageway | Rural | Fine no high winds | Car |
| 660679 | 201091NJ14695 | Fatal | 21-10-2022 | 58.445475 | Darkness - lights lit | Highland | -3.065535 | 1 | 1 | Wet or damp | Single carriageway | Rural | Fine no high winds | Car |
| 660680 | 201091NJ14695 | Fatal | 21-10-2022 | 58.445475 | Darkness - lights lit | Highland | -3.065535 | 1 | 1 | Wet or damp | Single carriageway | Rural | Fine no high winds | Car |
| 660681 | 201091NJ14695 | Fatal | 21-10-2022 | 58.445475 | Darkness - lights lit | Highland | -3.065535 | 1 | 1 | Wet or damp | Single carriageway | Rural | Fine no high winds | Car |
| 660682 | 201091NJ14695 | Fatal | 21-10-2022 | 58.445475 | Darkness - lights lit | Highland | -3.065535 | 1 | 1 | Wet or damp | Single carriageway | Rural | Fine no high winds | Car |
660683 rows × 14 columns
Descriptive Analytics
accident.describe()
| | Latitude | Longitude | Number_of_Casualties | Number_of_Vehicles |
|---|---|---|---|---|
| count | 660658.000000 | 660657.000000 | 660683.000000 | 660683.000000 |
| mean | 52.553845 | -1.431256 | 1.356970 | 1.831181 |
| std | 1.406684 | 1.383305 | 0.824734 | 0.715285 |
| min | 49.914430 | -7.516225 | 1.000000 | 1.000000 |
| 25% | 51.490690 | -2.332472 | 1.000000 | 1.000000 |
| 50% | 52.315646 | -1.411916 | 1.000000 | 2.000000 |
| 75% | 53.453473 | -0.232870 | 1.000000 | 2.000000 |
| max | 60.757544 | 1.762010 | 68.000000 | 32.000000 |
accident.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Latitude | 660658.0 | 52.553845 | 1.406684 | 49.914430 | 51.490690 | 52.315646 | 53.453473 | 60.757544 |
| Longitude | 660657.0 | -1.431256 | 1.383305 | -7.516225 | -2.332472 | -1.411916 | -0.232870 | 1.762010 |
| Number_of_Casualties | 660683.0 | 1.356970 | 0.824734 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 68.000000 |
| Number_of_Vehicles | 660683.0 | 1.831181 | 0.715285 | 1.000000 | 1.000000 | 2.000000 | 2.000000 | 32.000000 |
Checking Null Values
accident.isnull().sum()
Index                          0
Accident_Severity              0
Accident Date                  0
Latitude                      25
Light_Conditions               0
District Area                  0
Longitude                     26
Number_of_Casualties           0
Number_of_Vehicles             0
Road_Surface_Conditions      726
Road_Type                   4519
Urban_or_Rural_Area           15
Weather_Conditions         14127
Vehicle_Type                   0
dtype: int64
Filling Up Null Values
accident['Latitude'] = accident['Latitude'].fillna(accident['Latitude'].mean())
accident['Longitude'] = accident['Longitude'].fillna(accident['Longitude'].mean())
accident['Road_Surface_Conditions'] = accident['Road_Surface_Conditions'].fillna("unaccounted")
accident['Road_Type'] = accident['Road_Type'].fillna("unaccounted")
accident['Urban_or_Rural_Area'] = accident['Urban_or_Rural_Area'].fillna(accident['Urban_or_Rural_Area'].mode()[0])
accident['Weather_Conditions'] = accident['Weather_Conditions'].fillna("unaccounted")
accident.isnull().sum()
Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
dtype: int64
Categorical Data
accident['Accident_Severity'] = accident['Accident_Severity'].astype('category')
accident['Light_Conditions'] = accident['Light_Conditions'].astype('category')
accident['Road_Surface_Conditions'] = accident['Road_Surface_Conditions'].astype('category')
accident['Road_Type'] = accident['Road_Type'].astype('category')
accident['Urban_or_Rural_Area'] = accident['Urban_or_Rural_Area'].astype('category')
accident['Weather_Conditions'] = accident['Weather_Conditions'].astype('category')
accident['Vehicle_Type'] = accident['Vehicle_Type'].astype('category')
accident.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 660683 entries, 0 to 660682
Data columns (total 14 columns):
 #   Column                   Non-Null Count   Dtype
---  ------                   --------------   -----
 0   Index                    660683 non-null  object
 1   Accident_Severity        660683 non-null  category
 2   Accident Date            660683 non-null  object
 3   Latitude                 660683 non-null  float64
 4   Light_Conditions         660683 non-null  category
 5   District Area            660683 non-null  object
 6   Longitude                660683 non-null  float64
 7   Number_of_Casualties     660683 non-null  int64
 8   Number_of_Vehicles       660683 non-null  int64
 9   Road_Surface_Conditions  660683 non-null  category
 10  Road_Type                660683 non-null  category
 11  Urban_or_Rural_Area      660683 non-null  category
 12  Weather_Conditions       660683 non-null  category
 13  Vehicle_Type             660683 non-null  category
dtypes: category(7), float64(2), int64(2), object(3)
memory usage: 39.7+ MB
Cleaning any Inconsistencies with the Dataset
accident['Accident Date'] = accident['Accident Date'].str.strip()
accident['Accident Date'] = accident['Accident Date'].astype('str')
accident['Accident Date'] = accident['Accident Date'].str.replace('/', '-')
accident['Accident Date'] = pd.to_datetime(accident['Accident Date'], dayfirst = True, errors='coerce')
Extracting Date Information Using Pandas DateTime
accident['Year'] = accident['Accident Date'].dt.year
accident['Month'] = accident['Accident Date'].dt.month
accident['Day'] = accident['Accident Date'].dt.day
accident['DayOfWeek'] = accident['Accident Date'].dt.dayofweek # Monday = 0, Sunday = 6
accident.isnull().sum()
Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
Year                       0
Month                      0
Day                        0
DayOfWeek                  0
dtype: int64
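The date cleaning steps above can be checked on a small sample before trusting them on the full dataset. The sketch below uses made-up date strings and a hypothetical `parse_dates` helper (neither is part of the notebook). It also swaps `dayfirst=True` for an explicit `format='%d-%m-%Y'`, which is an assumption about the data: every date is expected to be day-month-year once the separators are unified.

```python
import pandas as pd

def parse_dates(raw: pd.Series) -> pd.Series:
    """Unify '/' and '-' separators, then parse strictly as day-month-year."""
    cleaned = raw.astype(str).str.strip().str.replace('/', '-', regex=False)
    # errors='coerce' turns anything that does not match the format into NaT.
    return pd.to_datetime(cleaned, format='%d-%m-%Y', errors='coerce')

# Illustrative values only; the last entry is deliberately invalid.
sample = pd.Series(['5/6/2019', ' 26-08-2019', '21-10-2022', 'not a date'])
parsed = parse_dates(sample)
unparsed = int(parsed.isna().sum())

print(parsed)
print(f"Unparseable dates: {unparsed}")
```

Counting the NaT values right after parsing makes it visible when `errors='coerce'` has silently dropped dates, which the later `isnull().sum()` check would also reveal.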
DATA EXPLORATION
Conducting data exploration by structuring and analyzing thirty-one (31) key questions
Below are the thirty-one (31) questions that establish the structure of the analytical process
To optimize traffic police patrols, on which day of the week do most accidents occur?
Has there been a year-over-year trend in accident rates? Are our safety measures working?
What percentage of accidents causes at least one serious or fatal injury?
On average, how many people get hurt in each accident, and does this number change when more vehicles are involved?
Is there a specific combination of light and weather conditions that is particularly dangerous?
In each region, do more accidents happen in urban areas or rural areas?
During rainy weather, which type of vehicle is most often involved in accidents?
For our annual review, for each year, what was the district with the highest number of accidents? Has the worst-performing district changed?
When are motorcycle accidents most likely to happen during the week?
Which month of the year has the highest frequency of accidents?
Are there geographic clusters of severe accidents, and is accident severity related to latitude and longitude?
Do accidents on weekends correlate with higher severity than accidents on weekdays?
Which road type has the strongest correlation with higher casualty counts?
Do accidents involving cars, motorcycles, and HGVs differ in the number of other vehicles involved?
Do multi-vehicle accidents result in more casualties compared to single-vehicle accidents?
How do accident rates fluctuate month-by-month? Is there a predictable seasonal pattern?
Is the problem getting better or worse in rural areas compared to urban ones?
Do locations with more accidents also tend to have more casualties on average?
For accidents involving a bus, what is the relationship with the number of other vehicles?
What are the most common weather conditions during accidents on roundabouts?
Which light conditions are most often associated with fatal accidents?
Which districts have the highest ratio of serious accidents to slight accidents?
What are the top 10 districts with the poorest reporting of road surface conditions?
How many vehicles are usually involved in accidents on dual carriageways?
What is the distribution of the number of casualties in accidents that occur in darkness?
Where should we focus fatal accident prevention programs?
Do most accidents occur during daylight or nighttime hours in high-risk districts?
What are the most common weather conditions during accidents in Kensington and Chelsea District?
Where are the districts with highest accident rates during December holidays?
Which districts experience the most accidents where taxis or private hire vehicles are involved?
Which districts have the highest accident rates on dry road surfaces?
Question 1
To optimize traffic police patrols, on which day of the week do most accidents occur?
# Count accidents per day of week (0=Mon, 6=Sun)
counts = accident['DayOfWeek'].value_counts().reindex(np.arange(7), fill_value=0)
# Label the days
counts.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
print(counts)
Mon     72662
Tue     94537
Wed     99541
Thu     99502
Fri     97994
Sat    107162
Sun     89285
Name: count, dtype: int64
INSIGHTS: Saturday has the highest volume of accidents, with 107,162 incidents, while Monday has the lowest with 72,662. To optimize traffic police patrols for maximum visibility and rapid response, Saturdays should therefore be the priority.
Question 2
Has there been a year-over-year trend in accident rates? Are our safety measures working?
accidents_by_year = accident['Year'].value_counts().sort_index()
print(accidents_by_year)
Year
2019    182115
2020    170591
2021    163554
2022    144423
Name: count, dtype: int64
INSIGHTS: The number of recorded accidents fell every year, from 182,115 in 2019 to 144,423 in 2022, a drop of about 21%. This steady decline is consistent with our safety measures working, but the counts alone cannot prove it: other factors, such as changes in traffic volume, could also contribute.
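To put numbers on the year-over-year trend, the percentage change between consecutive years can be computed with `Series.pct_change`. The sketch below re-enters the yearly totals printed above by hand so that it runs on its own; in the notebook the same calls could be applied to `accidents_by_year` directly.

```python
import pandas as pd

# Yearly totals copied from the value_counts output above.
accidents_by_year = pd.Series(
    {2019: 182115, 2020: 170591, 2021: 163554, 2022: 144423},
    name='Accidents',
)

# Percentage change relative to the previous year (first year is NaN).
yoy_change = accidents_by_year.pct_change().mul(100).round(2)

# Overall change from the first to the last year, in percent.
total_change = round(
    (accidents_by_year.iloc[-1] / accidents_by_year.iloc[0] - 1) * 100, 2
)

print(yoy_change)
print(f"Change from 2019 to 2022: {total_change}%")
```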
Question 3
What percentage of accidents causes at least one serious or fatal injury?
percentage = ((accident['Accident_Severity'] == 'Fatal') | (accident['Accident_Severity'] == 'Serious')).mean() * 100
print(f"{percentage:.2f}% of accidents involved a fatal or serious injury.")
14.68% of accidents involved a fatal or serious injury.
INSIGHTS: About 14.68% of accidents cause a serious or fatal injury. This means that for every 100 accidents, almost 15 have severe outcomes. Most accidents are minor, but this number shows that dangerous crashes still happen often. This is important to know so we can focus on making roads safer.
Question 4
On average, how many people get hurt in each accident, and does this number change when more vehicles are involved?
overall_avg = round(accident['Number_of_Casualties'].mean(), 2)
avg_by_vehicles = accident.groupby('Number_of_Vehicles')['Number_of_Casualties'].mean().round(2)
overall_avg, avg_by_vehicles
(np.float64(1.36),
 Number_of_Vehicles
 1      1.17
 2      1.37
 3      1.71
 4      2.00
 5      2.32
 6      2.61
 7      3.06
 8      3.40
 9      3.35
 10     3.63
 11     4.00
 12     2.29
 13     7.83
 14     5.44
 15     5.00
 16     6.00
 19    13.00
 28    16.00
 32     5.00
 Name: Number_of_Casualties, dtype: float64)
INSIGHTS: On average, 1.36 people get hurt in each accident. This number does change when more vehicles are involved. For example, accidents with just one vehicle have about 1.17 casualties, while those with five vehicles have about 2.32. More vehicles usually lead to more people hurt, which makes sense because more vehicles means more people are at risk in a single crash. The averages become erratic for very large pile-ups (for example, 2.29 for 12 vehicles but 7.83 for 13), probably because such accidents are rare.
Question 5
Is there a specific combination of light and weather conditions that is particularly dangerous?
# Count accidents for each combination of light and weather conditions
danger = (accident.groupby(['Light_Conditions', 'Weather_Conditions']).size().sort_values(ascending=False).head(10))
danger
Light_Conditions Weather_Conditions
Daylight Fine no high winds 398654
Darkness - lights lit Fine no high winds 92045
Daylight Raining no high winds 49738
Darkness - no lighting Fine no high winds 24859
Darkness - lights lit Raining no high winds 22664
Daylight Other 10096
unaccounted 10042
Darkness - no lighting Raining no high winds 6205
Daylight Fine + high winds 5790
Raining + high winds 4938
dtype: int64
INSIGHTS: The most frequent combination is daylight with fine weather and no high winds, at almost 400,000 accidents. The second most frequent, about 92,000 accidents, is also fine weather but in darkness with street lights lit. These counts show where most accidents happen, not which conditions are most dangerous, because most driving also takes place in fine weather and daylight. One possible explanation is that drivers feel too confident when the weather is clear, but judging how dangerous a combination is would require comparing severity, or accident counts relative to traffic exposure.
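One way to look at danger rather than frequency is the share of accidents in each light and weather combination that are serious or fatal. The sketch below runs on a tiny made-up table (`toy`) that only borrows the notebook's column names and category labels; the `Is_Severe` column is a hypothetical helper, and the resulting percentages say nothing about the real data.

```python
import pandas as pd

# Made-up records standing in for the accident DataFrame.
toy = pd.DataFrame({
    'Light_Conditions': ['Daylight'] * 6 + ['Darkness - no lighting'] * 4,
    'Weather_Conditions': ['Fine no high winds'] * 10,
    'Accident_Severity': ['Slight'] * 5 + ['Serious']
                         + ['Slight', 'Slight', 'Serious', 'Fatal'],
})

# Flag accidents with a serious or fatal outcome.
toy['Is_Severe'] = toy['Accident_Severity'].isin(['Serious', 'Fatal'])

# Per combination: number of accidents and fraction that were severe.
severe_share = (
    toy.groupby(['Light_Conditions', 'Weather_Conditions'])['Is_Severe']
    .agg(accidents='size', severe_fraction='mean')
    .sort_values('severe_fraction', ascending=False)
)
print(severe_share)
```

Keeping the `accidents` count next to the fraction helps spot combinations whose high severe share rests on very few records.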
Question 6
In each region, do more accidents happen in urban areas or rural areas?
# Top 5 districts by number of accidents
top_5_districts = accident['District Area'].value_counts().head(5).index
# Urban vs Rural counts for top 5 districts
accident[accident['District Area'].isin(top_5_districts)].groupby(['District Area', 'Urban_or_Rural_Area']).size()
District Area Urban_or_Rural_Area
Birmingham Rural 134
Unallocated 0
Urban 13357
Bradford Rural 796
Unallocated 0
Urban 5416
Leeds Rural 1774
Unallocated 0
Urban 7124
Manchester Rural 143
Unallocated 0
Urban 6577
Sheffield Rural 462
Unallocated 0
Urban 5248
dtype: int64
INSIGHTS: In every one of the top five districts, many more accidents happen in urban areas than in rural areas. For instance, in Birmingham, there were 13,357 urban accidents but only 134 rural ones. This difference is the same for all the other top districts like Leeds, Manchester, and Sheffield, where urban accident numbers are always much higher. Therefore, for these areas, the answer is that more accidents happen in urban places.
Question 7
During rainy weather, which type of vehicle is most often involved in accidents?
# Top 5 vehicle types in rain accidents
top_5_vehicles = accident.loc[accident['Weather_Conditions'].str.lower().str.contains('rain'), 'Vehicle_Type'].value_counts().head(5)
top_5_vehicles
Vehicle_Type
Car                                      67134
Van / Goods 3.5 tonnes mgw or under       4711
Bus or coach (17 or more pass seats)      3571
Motorcycle over 500cc                     3488
Goods 7.5 tonnes mgw and over             2387
Name: count, dtype: int64
INSIGHTS: When the weather is rainy, cars are the vehicle type most often involved in accidents. Cars were involved in 67,134 accidents that occurred in the rain. The next highest vehicle type was vans, which were involved in 4,711 accidents. The most likely reason for this is that there are far more cars driving on the roads than any other kind of vehicle.
Question 8
For our annual review, for each year, what was the district with the highest number of accidents? Has the worst-performing district changed?
# Count accidents per district per year
counts = accident.groupby(['Year', 'District Area']).size().reset_index(name='Accidents')
# Get the district with the most accidents each year
worst_district_by_year = counts.loc[counts.groupby('Year')['Accidents'].idxmax()]
worst_district_by_year
| | Year | District Area | Accidents |
|---|---|---|---|
| 24 | 2019 | Birmingham | 3820 |
| 438 | 2020 | Birmingham | 3506 |
| 852 | 2021 | Birmingham | 3289 |
| 1268 | 2022 | Birmingham | 2876 |
INSIGHTS: Birmingham was the district with the highest number of accidents every year from 2019 to 2022. The number of accidents in Birmingham decreased each year, from 3,820 in 2019 to 2,876 in 2022. Even though the total number went down over time, Birmingham remained the worst-performing district. This means the district with the most accidents did not change during this period.
Question 9
When are motorcycle accidents most likely to happen during the week?
# Filter motorcycle accidents
motorcycle_accidents = accident[accident['Vehicle_Type'].str.lower().str.contains('motorcycle')]
# Count accidents by day of the week
accidents_by_day = motorcycle_accidents['DayOfWeek'].value_counts().sort_index()
# Label the days
accidents_by_day.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
accidents_by_day
Mon    6122
Tue    8102
Wed    8404
Thu    8597
Fri    8245
Sat    9131
Sun    7576
Name: count, dtype: int64
INSIGHTS: Motorcycle accidents are most likely to happen on a Saturday, with 9,131 incidents. Counts are also high from Tuesday to Friday, at more than 8,000 per day. Monday has the lowest number of motorcycle accidents (6,122), followed by Sunday (7,576). This suggests that motorcycle accidents are concentrated on Saturdays and midweek rather than at the start of the week.
Question 10
Which month of the year has the highest frequency of accidents?
accident['Month'] = accident['Accident Date'].dt.month
# Count accidents by month
accidents_by_month = accident['Month'].value_counts().sort_index()
accidents_by_month
Month
1     52852
2     49482
3     54084
4     51741
5     56348
6     56474
7     57439
8     53909
9     56450
10    59673
11    60409
12    51822
Name: count, dtype: int64
INSIGHTS: November has the highest number of accidents for the year, with 60,409 incidents, followed by October with 59,673. The months with the fewest accidents are February (49,482) and April (51,741). Counts generally rise from the start of the year to a peak in October and November, then fall sharply in December (51,822). The peak in late autumn might be because of darker evenings and more rain.
Question 11
Are there geographic clusters of severe accidents, and is accident severity related to latitude and longitude?
# Check if location is correlated with the number of people hurt.
lat = accident['Latitude'].corr(accident['Number_of_Casualties'])
lon = accident['Longitude'].corr(accident['Number_of_Casualties'])
print(f"Correlation Latitude vs. Casualties: {lat:.3f}")
print(f"Correlation Longitude vs. Casualties: {lon:.3f}")
Correlation Latitude vs. Casualties: 0.032
Correlation Longitude vs. Casualties: -0.040
INSIGHTS: The correlations between location and the number of casualties, used here as a proxy for severity, are very weak. Latitude has a small positive correlation of 0.032 and longitude a small negative correlation of -0.040. There is therefore no meaningful linear relationship between geographic coordinates and how severe an accident is. A linear correlation cannot reveal local clusters, however, so this result does not rule out hotspots of severe accidents.
Question 12
Do accidents on weekends correlate with higher severity than accidents on weekdays?
accident['Is_Weekend'] = (accident['DayOfWeek'] >= 5).astype(int) # Assuming Mon=0, Sun=6
correlation = accident['Is_Weekend'].corr(accident['Number_of_Casualties'])
print(f"Correlation between Weekend and Number of Casualties: {correlation:.3f}")
Correlation between Weekend and Number of Casualties: 0.021
INSIGHTS: No, accidents on weekends are not meaningfully more severe than accidents on weekdays. The correlation of 0.021 between the weekend flag and the number of casualties shows almost no connection. The number of casualties in an accident is nearly the same whether it happens on a weekday or on a Saturday or Sunday.
Question 13
Which road type has the strongest correlation with higher casualty counts?
road_type_severity = accident.groupby('Road_Type')['Number_of_Casualties'].mean()
print(road_type_severity.sort_values(ascending=False).head(3))
Road_Type
Dual carriageway      1.477298
Slip road             1.423661
Single carriageway    1.344578
Name: Number_of_Casualties, dtype: float64
INSIGHTS: Dual carriageways have the strongest link to higher casualty counts, with an average of about 1.48 casualties per accident. Slip roads are next with 1.42 casualties, followed by single carriageways with 1.34. This means accidents on faster, multi-lane roads like dual carriageways tend to result in more people getting hurt. The road type is related to the severity of the accidents that happen on it.
Question 14
Do accidents involving cars, motorcycles, and HGVs differ in the number of other vehicles involved?
car = accident[accident['Vehicle_Type'] == 'Car']['Number_of_Vehicles']
bike = accident[accident['Vehicle_Type'].str.contains('Motorcycle')]['Number_of_Vehicles']
hgv = accident[accident['Vehicle_Type'].str.contains('Goods')]['Number_of_Vehicles']
f_oneway(car, bike, hgv)
F_onewayResult(statistic=np.float64(1.4670124971711498), pvalue=np.float64(0.23061422663996944))
INSIGHTS: The test finds no statistically significant difference in the number of vehicles involved between accidents with cars, motorcycles, and goods vehicles. With a p-value of 0.23, the small differences between the group means could plausibly be due to random chance. This is not proof that vehicle type has no effect, only that this data does not show one. Note that the 'Goods' filter also matches vans ('Van / Goods 3.5 tonnes mgw or under'), so the third group is broader than HGVs alone.
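Because the substring filter `'Goods'` also matches the van label, a small classification helper can keep the groups apart before running the test. The sketch below is one possible mapping, not the notebook's method: `vehicle_group` is a hypothetical helper, and treating only non-van 'Goods' labels as HGVs is an assumption. The labels are the ones that appear in the outputs above.

```python
import pandas as pd

def vehicle_group(vehicle_type: str) -> str:
    """Map a raw Vehicle_Type label to a coarse group (hypothetical helper)."""
    if vehicle_type == 'Car':
        return 'Car'
    if 'Motorcycle' in vehicle_type:
        return 'Motorcycle'
    # Vans also carry the word 'Goods', so exclude them explicitly.
    if 'Goods' in vehicle_type and not vehicle_type.startswith('Van'):
        return 'HGV'
    return 'Other'

labels = pd.Series([
    'Car',
    'Motorcycle over 500cc',
    'Goods 7.5 tonnes mgw and over',
    'Van / Goods 3.5 tonnes mgw or under',
    'Bus or coach (17 or more pass seats)',
])
groups = labels.map(vehicle_group)
print(pd.DataFrame({'Vehicle_Type': labels, 'Group': groups}))
```

In the notebook, the same mapping could be applied to `accident['Vehicle_Type'].astype(str)` and the resulting groups passed to `f_oneway`.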
Question 15
Do multi-vehicle accidents result in more casualties compared to single-vehicle accidents?
single = accident[accident['Number_of_Vehicles'] == 1]['Number_of_Casualties']
multiple = accident[accident['Number_of_Vehicles'] > 1]['Number_of_Casualties']
f_oneway(single, multiple)
F_onewayResult(statistic=np.float64(15030.438521505124), pvalue=np.float64(0.0))
INSIGHTS: Yes, accidents involving multiple vehicles result in more casualties compared to single-vehicle accidents. The p-value is reported as 0.0 because it is too small to represent, so the difference is statistically significant and very unlikely to be due to chance. Significance does not measure how large the difference is, though; with more than 660,000 records even small gaps are detected. When more than one vehicle is involved in a collision, the average number of people injured is higher, likely because these accidents involve more people overall.
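With only two groups, a one-way ANOVA is equivalent to a pooled-variance t-test (the F statistic equals the squared t statistic), and an effect size such as Cohen's d shows how large the difference is rather than just whether it exists. The sketch below demonstrates both on simulated casualty counts; the Poisson parameters and sample sizes are arbitrary assumptions, not estimates from the dataset.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

# Simulated casualty counts (at least 1 per accident); values are made up.
rng = np.random.default_rng(0)
single = rng.poisson(0.2, size=500) + 1
multiple = rng.poisson(0.5, size=800) + 1

f_stat, p_anova = f_oneway(single, multiple)
t_stat, p_ttest = ttest_ind(single, multiple)  # equal_var=True by default

# Cohen's d: difference in means divided by the pooled standard deviation.
n1, n2 = len(single), len(multiple)
pooled_sd = np.sqrt(
    ((n1 - 1) * single.var(ddof=1) + (n2 - 1) * multiple.var(ddof=1))
    / (n1 + n2 - 2)
)
cohens_d = (multiple.mean() - single.mean()) / pooled_sd

print(f"F = {f_stat:.3f}, t^2 = {t_stat ** 2:.3f}")
print(f"Cohen's d = {cohens_d:.3f}")
```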
Question 16
How do accident rates fluctuate month-by-month? Is there a predictable seasonal pattern?
accidents_by_month = accident['Month'].value_counts().sort_index()
plt.plot(accidents_by_month.index, accidents_by_month.values, marker='o')
plt.title('Seasonality of Accidents (Monthly Trend)')
plt.xlabel('Month')
plt.ylabel('Number of Accidents')
plt.xticks(range(1,13))
plt.grid(True)
plt.show()
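INSIGHTS: Accident counts show a seasonal pattern, but not a winter peak. Based on the monthly counts from Question 10, accidents peak in October and November (59,673 and 60,409) and are lowest in February (49,482). December and January are also among the quieter months, while June and July sit in between at roughly 56,000 to 57,000. Because this plot pools all four years, it cannot show on its own whether the pattern repeats every year; that would require looking at each year separately.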
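Whether a seasonal pattern repeats can be checked by counting accidents per month separately for each year. The sketch below builds such a month-by-year table with `pd.crosstab` on a handful of made-up dates; the `toy` frame only mirrors the notebook's `Year` and `Month` columns.

```python
import pandas as pd

# Made-up accident dates covering two years.
dates = pd.to_datetime([
    '2019-01-15', '2019-11-02', '2019-11-20',
    '2020-01-05', '2020-11-11', '2020-11-30', '2020-12-24',
])
toy = pd.DataFrame({'Year': dates.year, 'Month': dates.month})

# Rows are months, columns are years, cells are accident counts.
monthly_by_year = pd.crosstab(toy['Month'], toy['Year'])

# Month with the most accidents in each year.
peak_month = monthly_by_year.idxmax()

print(monthly_by_year)
print(peak_month)
```

If the peak month is the same in every column, the seasonal pattern is stable across years; plotting one line per column would show the same thing visually.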
Question 17
Is the problem getting better or worse in rural areas compared to urban ones?
rural_trend = accident[accident['Urban_or_Rural_Area'] == 'Rural'].groupby('Year').size()
urban_trend = accident[accident['Urban_or_Rural_Area'] == 'Urban'].groupby('Year').size()
plt.plot(urban_trend.index, urban_trend.values, marker='o', label='Urban')
plt.plot(rural_trend.index, rural_trend.values, marker='s', label='Rural')
plt.title('Trend of Accidents: Urban vs. Rural')
plt.xlabel('Year')
plt.ylabel('Number of Accidents')
plt.legend()
plt.grid(True)
plt.show()
INSIGHTS: The problem appears to be getting worse in rural areas relative to urban ones. Urban accidents are going down, but rural accidents are going up, so the gap between rural and urban safety is growing.
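Urban and rural counts differ a lot in scale, so their trends are easier to compare when each series is indexed to its first year (first year = 100). The sketch below shows the calculation on invented yearly counts; the numbers in `trend` are placeholders and do not come from the dataset.

```python
import pandas as pd

# Placeholder yearly counts, invented purely to illustrate the calculation.
trend = pd.DataFrame(
    {'Urban': [1000, 950, 900, 800], 'Rural': [500, 490, 470, 450]},
    index=pd.Index([2019, 2020, 2021, 2022], name='Year'),
)

# Divide every year by the first year's value and scale to 100.
indexed = trend.div(trend.iloc[0]).mul(100).round(1)
print(indexed)
```

In the notebook, `urban_trend` and `rural_trend` could be combined into one DataFrame and indexed the same way before plotting.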
Question 18
Do locations with more accidents also tend to have more casualties on average?
# Group by rounded lat and long
cluster_stats = accident.groupby([accident['Latitude'].round(2), accident['Longitude'].round(2)])
# Count and mean in one step
counts = cluster_stats['Index'].count()
casualties = cluster_stats['Number_of_Casualties'].mean()
# Scatter plot
plt.scatter(counts, casualties, alpha=0.5)
plt.title('Accidents vs Average Casualties by Location')
plt.xlabel('Accidents')
plt.ylabel('Average Casualties')
plt.show()
INSIGHTS: No. The number of accidents at a location does not affect how severe they are. A place with many crashes has the same average casualties as a place with few crashes.
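The visual impression from the scatter plot can be backed by a number: the correlation between the accident count and the average casualties per location cell. The sketch below applies the same rounding-based grouping to a few made-up coordinates; the `toy` frame and its values are illustrative only.

```python
import pandas as pd

# Made-up accidents in three rough locations.
toy = pd.DataFrame({
    'Latitude': [51.501, 51.502, 51.503, 53.481, 53.482, 55.951],
    'Longitude': [-0.141, -0.142, -0.139, -2.241, -2.243, -3.191],
    'Number_of_Casualties': [1, 2, 1, 1, 3, 1],
})

# One row per grid cell: accident count and mean casualties.
cells = (
    toy.groupby([toy['Latitude'].round(2), toy['Longitude'].round(2)])
    ['Number_of_Casualties']
    .agg(accidents='size', avg_casualties='mean')
)

# Pearson correlation between cell size and average casualties.
cell_corr = cells['accidents'].corr(cells['avg_casualties'])

print(cells)
print(f"Correlation: {cell_corr:.3f}")
```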
Question 19
For accidents involving a bus, what is the relationship with the number of other vehicles?
bus_accidents = accident[accident['Vehicle_Type'].str.contains('Bus', na=False)]
plt.scatter(bus_accidents['Number_of_Vehicles'], bus_accidents['Number_of_Casualties'], alpha=0.3, s=20, color='purple')
plt.title('Bus Accidents: Vehicles vs Casualties')
plt.xlabel('Vehicles Involved')
plt.ylabel('Casualties')
plt.show()
INSIGHTS: For bus accidents, more vehicles usually mean more casualties. The scatter plot shows a clear upward trend. When a bus is involved in a crash with more vehicles, the number of casualties tends to be higher. This indicates that multi-vehicle collisions are generally more severe.
Question 20
What are the most common weather conditions during accidents on roundabouts?
roundabout_weather = accident[accident['Road_Type'] == 'Roundabout']['Weather_Conditions'].value_counts().head(8)
plt.bar(roundabout_weather.index, roundabout_weather.values, color='lightblue')
plt.title('Weather in Roundabout Accidents')
plt.xlabel('Weather')
plt.ylabel('Accidents')
plt.xticks(rotation=90)
plt.show()
INSIGHTS: Most accidents on roundabouts occur in fine weather with no strong winds. The second most common condition is rainy weather without high winds. This tells us that driver error in normal conditions may be a bigger factor than bad weather.
Question 21
Which light conditions are most often associated with fatal accidents?
fatal_light = accident[accident['Accident_Severity'] == 'Fatal']['Light_Conditions'].value_counts().head(6)
plt.bar(fatal_light.index, fatal_light.values, color='black')
plt.title('Fatal Accidents by Light Condition')
plt.xlabel('Light Condition')
plt.ylabel('Fatal Accidents')
plt.xticks(rotation=90)
plt.show()
INSIGHTS: Most fatal accidents happen in daylight. Darkness with street lights is also common, which means fatal crashes are not just caused by poor visibility.
Question 22
Which districts have the highest ratio of serious accidents to slight accidents?
counts = accident.groupby(['District Area', 'Accident_Severity']).size().unstack(fill_value=0)
severity_ratio = ((counts['Serious'] + counts['Fatal']) / counts['Slight']).nlargest(15)
plt.barh(severity_ratio.index, severity_ratio.values, color='darkred')
plt.title('Top Districts: Serious+Fatal vs Slight Accidents')
plt.xlabel('Ratio')
plt.show()
INSIGHTS: Selly, Maldon, and Purbeck have the highest ratio of serious and fatal accidents to slight accidents. This means recorded crashes there are more likely to be severe. It may point to more dangerous roads, although districts with few accidents can produce high ratios by chance.
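Ratios computed from a handful of accidents are unstable, so it can help to rank only districts above a minimum accident count. The sketch below illustrates the idea on an invented severity table; the district names, counts, and the `MIN_ACCIDENTS` threshold are all assumptions chosen for the example.

```python
import pandas as pd

# Invented severity counts per district.
counts = pd.DataFrame(
    {'Fatal': [1, 20, 0], 'Serious': [2, 180, 1], 'Slight': [3, 1800, 40]},
    index=pd.Index(
        ['Tiny district', 'Large district', 'Quiet district'],
        name='District Area',
    ),
)

MIN_ACCIDENTS = 100  # arbitrary cut-off for a "reliable" ratio

totals = counts.sum(axis=1)
ratio = (counts['Serious'] + counts['Fatal']) / counts['Slight']

# Keep only districts with enough accidents, then rank by ratio.
reliable_ratio = ratio[totals >= MIN_ACCIDENTS].sort_values(ascending=False)

print(ratio.sort_values(ascending=False))
print(reliable_ratio)
```

Here the unfiltered ranking is led by the district with only six accidents, which is exactly the kind of result the filter is meant to flag.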
Question 23
What are the top 10 districts with the poorest reporting of road surface conditions?
outlier_districts = (
    accident[accident['Road_Surface_Conditions'] == 'unaccounted']
    .groupby('District Area')
    .size()
    / accident.groupby('District Area').size() * 100
)
top10 = outlier_districts.nlargest(10)
plt.barh(top10.index, top10.values, color="grey")
plt.title('Top 10 Districts with Most "Unaccounted" Road Surface Data')
plt.xlabel('Percentage (%)')
plt.show()
INSIGHTS: Lincoln has the worst reporting for road surface data. North Kesteven and Gedling are also poor. These areas often list road conditions as "unaccounted." This missing data makes safety analysis difficult there.
Question 24
How many vehicles are usually involved in accidents on dual carriageways?
dual_accidents = accident[accident['Road_Type'] == 'Dual carriageway']
plt.hist(dual_accidents['Number_of_Vehicles'], bins=7, color='red')
plt.title('Vehicles in Dual Carriageway Accidents')
plt.xlabel('Vehicles Involved')
plt.ylabel('Count')
plt.grid(True, alpha=0.3)
plt.show()
INSIGHTS: Most accidents on dual carriageways involve 2 vehicles. This is by far the most common number. The next most common are 1-vehicle and then 3-vehicle accidents.
Question 25
What is the distribution of the number of casualties in accidents that occur in darkness?
darkness_accidents = accident[accident['Light_Conditions'].str.startswith('Darkness', na=False)]
plt.figure(figsize=(10, 6))
plt.hist(darkness_accidents['Number_of_Casualties'], bins=20, color='midnightblue', edgecolor='white')
plt.title('Distribution of Casualties in Nighttime Accidents')
plt.xlabel('Number of Casualties')
plt.ylabel('Frequency')
plt.grid(True, alpha=0.3)
plt.show()
INSIGHTS: Most accidents in darkness have only 1 casualty. The vast majority involve very few people. Severe crashes with many casualties at night are rare.
Question 26
Where should we focus fatal accident prevention programs?
fatal_by_district = accident[accident['Accident_Severity'] == 'Fatal']['District Area'].value_counts().head(10)
plt.pie(fatal_by_district.values, labels=fatal_by_district.index, autopct='%1.1f%%')
plt.title('Top 10 Districts with Fatal Accidents')
plt.show()
INSIGHTS: Birmingham is the most important place to start: it accounts for 22.7% of the fatal accidents in the ten districts shown, well ahead of any other district. The next most critical areas are Leeds (12.6%), Highland (11.2%), and East Riding of Yorkshire (10.2%). Together, these four districts account for more than half (56.7%) of the fatal accidents in the top ten. Note that these percentages are shares of the top-ten total, not of all fatal accidents in the dataset.
The districts at the bottom of the top ten, like Powys (6.7%) and Bradford (7.1%), have fewer fatal accidents, so prevention programs there would have a smaller impact.
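A pie chart of the top N districts shows each district's share of the top-N total, which is larger than its share of all fatal accidents. The sketch below makes the difference explicit on invented counts; the district names and numbers in `fatal_counts` are placeholders.

```python
import pandas as pd

# Invented fatal-accident counts for five districts.
fatal_counts = pd.Series({'A': 50, 'B': 30, 'C': 20, 'D': 60, 'E': 40})

top2 = fatal_counts.nlargest(2)

# Share within the selected districts (what a pie chart of top2 displays).
share_within_top = (top2 / top2.sum() * 100).round(1)

# Share of every fatal accident in the data.
share_of_all = (top2 / fatal_counts.sum() * 100).round(1)

print(pd.DataFrame({
    'share_within_top': share_within_top,
    'share_of_all': share_of_all,
}))
```

In the notebook, dividing `fatal_by_district` by the total number of fatal accidents would give the second kind of share.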
Question 27
Do most accidents occur during daylight or nighttime hours in high-risk districts?
top3 = accident['District Area'].value_counts().head(3).index
# Plot
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
[accident[accident['District Area'] == d]['Light_Conditions'].value_counts().head(6).plot.pie(autopct='%1.1f%%', ax=axes[i], title=f'{d} District') for i, d in enumerate(top3)]
plt.suptitle('Light Conditions by District')
plt.tight_layout()
plt.show()
# Print
print("Light Conditions by District:\n" + "="*40)
for d in top3:
    print(f"\n{d}:\n{accident[accident['District Area'] == d]['Light_Conditions'].value_counts().head(6)}")
Light Conditions by District:
========================================

Birmingham:
Light_Conditions
Daylight                       9667
Darkness - lights lit          3672
Darkness - lighting unknown      82
Darkness - lights unlit          51
Darkness - no lighting           19
Name: count, dtype: int64

Leeds:
Light_Conditions
Daylight                       6482
Darkness - lights lit          1992
Darkness - lighting unknown     251
Darkness - no lighting          162
Darkness - lights unlit          11
Name: count, dtype: int64

Manchester:
Light_Conditions
Daylight                       4610
Darkness - lights lit          1878
Darkness - lighting unknown     202
Darkness - no lighting           18
Darkness - lights unlit          12
Name: count, dtype: int64
INSIGHTS: The three districts with the most accidents are Birmingham, Leeds, and Manchester. In Birmingham, 9,667 accidents occurred in daylight compared to 3,824 in darkness. Similarly, in Leeds there were 6,482 daytime accidents versus 2,416 at night, and in Manchester 4,610 occurred in daylight compared to 2,110 in darkness.
Accident prevention programs should focus mainly on daytime conditions, since most incidents occur during the day, though nighttime safety remains important.
Question 28
What are the most common weather conditions during accidents in Kensington and Chelsea District?
district_name = 'Kensington and Chelsea'  # named explicitly instead of relying on row order
district_data = accident[accident['District Area'] == district_name]
plt.figure(figsize=(10, 10))
weather_impact = district_data['Weather_Conditions'].value_counts().head(8)
plt.pie(weather_impact.values, labels=weather_impact.index, autopct='%1.1f%%')
plt.title(f'Weather Conditions for Accidents in {district_name} District')
plt.show()
print(f"Weather Conditions for Accidents in {district_name} District")
total_accidents = weather_impact.sum()
for condition, count in weather_impact.items():
    percentage = (count / total_accidents) * 100
    print(f"{condition}: {percentage:.1f}%")
Weather Conditions for Accidents in Kensington and Chelsea District
Fine no high winds: 84.7%
Raining no high winds: 12.3%
Other: 1.8%
Fine + high winds: 0.5%
Snowing no high winds: 0.4%
Raining + high winds: 0.3%
unaccounted: 0.1%
Fog or mist: 0.0%
INSIGHTS: Accidents in Kensington and Chelsea are concentrated under a single weather condition: fine weather with no high winds, which accounts for 84.7% of accidents. The next most common condition, rain with no high winds, is far less frequent at 12.3%. All other weather conditions, including high winds, fog, and snow, together make up only about 3% of incidents.
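The percentages above are computed over the eight most common recorded categories, and `value_counts()` drops missing values by default, so rows with no recorded weather condition are left out of the denominator. As a hedged alternative, the sketch below uses `value_counts(normalize=True, dropna=False)` to get shares directly while keeping missing values visible. The toy Series is illustrative only and is not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for district_data['Weather_Conditions']; values are illustrative only
weather = pd.Series([
    'Fine no high winds', 'Fine no high winds', 'Fine no high winds',
    'Raining no high winds', None,
])

# normalize=True returns proportions; dropna=False counts missing values as their own group
shares = weather.value_counts(normalize=True, dropna=False).mul(100).round(1)
print(shares)
```

Whether missing values belong in the denominator is a judgment call; the point is only that the choice changes the reported percentages.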
Question 29
Where are the districts with highest accident rates during December holidays?
holiday_data = accident[accident['Month'] == 12]
# Print district information
december_districts = holiday_data['District Area'].value_counts()
print("Accidents per district in December holidays:")
print(december_districts)
m8 = folium.Map(location=[55.3781, -3.4360], zoom_start=6)
HeatMap(holiday_data[['Latitude', 'Longitude']].dropna(), radius=12).add_to(m8)
m8
Accidents per district in December holidays:
District Area
Birmingham 1097
Leeds 702
Manchester 513
Bradford 505
Sheffield 484
...
Western Isles 10
Berwick-upon-Tweed 10
Teesdale 9
Orkney Islands 8
Clackmannanshire 5
Name: count, Length: 422, dtype: int64
INSIGHTS: Birmingham is the main hotspot for December holiday accidents with 1,097 cases and needs urgent safety action. Leeds (702 accidents) and Manchester (513 accidents) are next in priority. These three areas should get focused December safety measures like more traffic patrols, road safety campaigns, and public reminders since they have the most holiday accidents.
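The figures above are raw December counts, so districts with the most accidents overall naturally rank first. If a rate is wanted instead, one possible sketch is to divide each district's December count by its total count. The toy DataFrame below is an assumption for illustration and only mirrors the `District Area` and `Month` column names used in the notebook.

```python
import pandas as pd

# Toy data for illustration only; district names A, B, C are placeholders
toy = pd.DataFrame({
    'District Area': ['A', 'A', 'A', 'A', 'B', 'B', 'C'],
    'Month': [12, 1, 2, 3, 12, 6, 5],
})

# All accidents per district
total = toy['District Area'].value_counts()

# December accidents per district; districts with none get an explicit 0
december = (
    toy.loc[toy['Month'] == 12, 'District Area']
    .value_counts()
    .reindex(total.index, fill_value=0)
)

# Share of each district's accidents that fall in December, in percent
december_share = (december / total * 100).round(1).sort_values(ascending=False)
print(december_share)
```

In this toy example district A has the most accidents overall, but district B has the higher December share, which is the distinction between a count and a rate.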
Question 30
Which districts experience the most accidents where taxis or private hire vehicles are involved?
taxi_accidents = accident[accident['Vehicle_Type'].str.contains('Taxi|Private hire', na=False)]
taxi_districts = taxi_accidents['District Area'].value_counts()
print("Taxi accidents per district:")
print(taxi_districts)
m14 = folium.Map(location=[55.3781, -3.4360], zoom_start=6)
HeatMap(taxi_accidents[['Latitude', 'Longitude']].dropna(), radius=15).add_to(m14)
m14
Taxi accidents per district:
District Area
Birmingham 504
Westminster 219
Glasgow City 142
Leeds 135
Kensington and Chelsea 132
...
North Shropshire 2
Blaeu Gwent 2
Clackmannanshire 2
Chester-le-Street 1
London Airport (Heathrow) 1
Name: count, Length: 421, dtype: int64
INSIGHTS: Birmingham has the most taxi-related accidents with 504 incidents. Westminster is a distant second with 219, followed by Glasgow City (142), Leeds (135), and Kensington and Chelsea (132). Safety measures for taxis and private hire vehicles should be prioritized in Birmingham first, then expanded to the other high-incident districts.
Question 31
Which districts have lowest accident rates on dry road surfaces?
dry_roads = accident[accident['Road_Surface_Conditions'] == 'Dry']
dry_districts = dry_roads['District Area'].value_counts()
print("Dry roads accidents by district:")
print(dry_districts)
m19 = folium.Map(location=[55.3781, -3.4360], zoom_start=6)
HeatMap(dry_roads[['Latitude', 'Longitude']].dropna(), radius=12).add_to(m19)
m19
Dry roads accidents by district:
District Area
Birmingham 9367
Leeds 6397
Westminster 4596
Manchester 4493
Bradford 4047
...
Teesdale 85
Western Isles 78
Clackmannanshire 56
Shetland Islands 49
Orkney Islands 49
Name: count, Length: 422, dtype: int64